Inference  of  Multicast  Routing  Trees  and  Bottleneck  Bandwidths 

using  End-to-end  Measurements 


Sylvia  Ratnasamy  and  Steven  McCanne 


Report  No.  UCB/CSD-98-1019 

October  1998 

Computer  Science  Division  (EECS) 
University  of  California 
Berkeley,  California  94720 


Work  supported  by  DARPA  Grant 
66001-96-C-8508  and  the 
California  State  MICRO  Program. 


Distribution statement: Approved for public release; distribution unlimited.



Abstract 

The efficacy of end-to-end multicast transport protocols depends
critically upon their ability to scale efficiently to a large number of
receivers. Several research multicast protocols attempt to achieve this
high scalability by identifying sets of co-located receivers in order to
enhance loss recovery, congestion control and so forth. A number of
these schemes could be enhanced and simplified by some level of explicit
knowledge of the topology of the multicast distribution tree, the value
of the bottleneck bandwidth along the path between the source and each
individual receiver and the approximate location of the bottlenecks in
the tree. In this paper, we explore the problem of inferring the
internal structure of a multicast distribution tree using only
observations made at the end hosts. By noting correlations of loss
patterns across the receiver set and by measuring how the network
perturbs the fine-grained timing structure of the packets sent from the
source, we can determine both the underlying multicast tree structure as
well as the bottleneck bandwidths. Our simulations show that the
algorithm is robust and appears to converge to the correct tree with
high probability.

1  Introduction  and  Motivation 

The IP Multicast service provides for efficient one-to-many packet
transmission. A single packet transmitted by the source is delivered to
an arbitrary number of receivers by replicating the packet within the
network at fan-out points along a distribution tree rooted at the
traffic’s source [11]. The IP Multicast service model provides a
best-effort service, yet a number of emerging applications, such as
shared whiteboards, software updates, and news articles, require
reliable packet delivery. To meet this requirement, reliable multicast
protocols such as SRM [4], RMTP [9], and TMTP [20] build reliability on
top of this unreliable service.

A key challenge in the design of a reliable multicast protocol is its
loss recovery algorithm, which has proven difficult to scale to a large
number of receivers. For example, the global loss recovery component of
SRM multicasts retransmission requests and replies to the entire group
and thus scales poorly [16]: the entire tree participates in the
recovery process, and even a single lossy receiver can significantly
degrade the overall session performance. To solve this problem, a number
of schemes have been proposed that try to restrict error recovery
traffic to the required scope, i.e., these schemes attempt to achieve
local recovery. The key idea behind local recovery is to identify loss
neighborhoods of receivers that share similar loss patterns and confine
error recovery to this neighborhood without disturbing the rest of the
tree. Schemes based on this approach include:

•  the use of hop-scoping to control the distance travelled by
retransmission requests and replies [10];

•  the use of separate local multicast groups for error recovery [10];

•  replier-based schemes, based on a new set of router forwarding
services such as directed multicast and subcast forwarding [13]; and,

•  the use of a new “randomcast” forwarding service to form “search
parties” of loss-affected members searching for lost data [3].

All these schemes essentially require that the receiver discover the
loss recovery group to which it belongs and search for potential
candidates to retransmit lost packets.

The Reliable Multicast Transport Protocol [9] attempts to solve the
loss recovery problem by organizing members into a hierarchy.
Acknowledgments are sent not to the source, but to the parent member in
the tree. Internal nodes in the hierarchy, called Designated Receivers
(DRs), cache data packets for later retransmission of lost packets. RMTP
therefore provides both implosion avoidance and local recovery. However,
for RMTP to perform well, the hierarchy of members must be very closely
correlated to the underlying multicast distribution tree and DRs need to
be optimally and dynamically distributed over the tree. How this can be
achieved is still an open research problem.

The recent work on Self-Organized Transcoding (SOT) [7] tries to adapt
continuous-media applications to varying network conditions through the
use of self-organized transcoding. In SOT, when a group of co-located
receivers detects loss caused by a congested link, an upstream receiver
with better reception at the far end of the bottleneck acts as a
transcoder and provides a customized version of the stream. A new stream
is multicast to a new address and receivers adversely affected by the
bottleneck switch to the new group. Receivers use the observed loss
patterns to decide when to switch groups. Since it is crucial to the
stability of the protocol that all receivers within the same loss
subtree switch groups together, decision errors regarding joining and
leaving groups need to be minimized. The problem of knowing when to join
and leave groups and knowing which group to join is equivalent to the
problem of knowing which loss neighborhood a receiver belongs to. The
problem of optimally placing a transcoder or a designated receiver is
essentially the problem of determining which receiver would be an ideal
candidate for retransmitting lost packets.

In each of the schemes outlined above (i.e., local recovery, RMTP, and
SOT), the receiver’s protocol could be enhanced and potentially
simplified with explicit knowledge of the underlying multicast
distribution tree. Unfortunately, the IP service model deliberately
hides this information in favor of a universal packet service that is
easily ported across diverse technologies and environments. To overcome
this, protocols like TCP adapt to physical path characteristics through
end-to-end adaptation (e.g., searching for the bottleneck bandwidth with
slow start and adapting to changes in available capacity with its
congestion avoidance mode). But unlike unicast TCP, multicast
communication creates many paths between a source and its receivers with
potentially heterogeneous characteristics. Consequently, researchers
have devised schemes like local recovery and SOT to discover the
homogeneous sub-regions of a heterogeneous multicast distribution tree,
and exploit this knowledge in the adaptation processes.

In this paper, we propose a scheme for deriving a fairly accurate
picture of the topology of a multicast distribution tree strictly from
end-to-end observations. Our approach relies upon complete information
of loss statistics at every receiver and thus is not a practical
protocol building block in its own right. However, we believe the
process of exploring this extreme point sheds light on the difficulty of
the problem and forms the foundation for follow-on work that could
exploit variations in the basic approach to trade off computational
overhead for topological accuracy. Even so, our scheme could be of
potential use in its current form for network monitoring, debugging, and
performance characterization using off-line processing.

Our approach to this topology discovery problem consists of two core
pieces: a tree inference algorithm and a bottleneck bandwidth estimator.
The tree inference engine clusters nodes according to shared loss and
estimates the tree according to a probabilistic model that eliminates
“false sharing”. The estimate converges to the true tree as more loss
statistics are collected. We combine this topological information with a
bottleneck bandwidth estimation technique in order to approximate the
location of the bottlenecks in the tree. The result is a model that
faithfully captures the link capacities and multicast topology of the
underlying physical tree even though our algorithms require only
information that is easily available at the end hosts and work with the
existing multicast routing service.

In the next section, we describe the bottleneck bandwidth estimation
technique. Section 3 describes the tree inference algorithm. In Section
4, the two algorithms are combined into a comprehensive algorithm that
approximates link capacities from the bottleneck measurements.
Implementation details and preliminary test results are in Section 5.
Finally, we describe related work on bottleneck bandwidth estimation and
path inference techniques, and conclude.

Figure 1: Packet pairs flowing through a bottleneck link. The vertical
dimension is bandwidth, horizontal dimension is time.

2  Bottleneck  Bandwidth  Estimation

Transmission of a packet from a source to a receiver involves
forwarding the packet along a series of consecutive links. Each link has
a maximum rate at which it can forward packets. The maximum rate of the
slowest link along the chain determines the maximum rate at which data
can be transmitted between the source and receiver. In other words, the
slowest link sets the bottleneck bandwidth along a given path. The
ability to measure this bottleneck bandwidth value stems from the
observation that as a packet is transmitted along a link, it is “spaced
out” in time depending on the transmission rate of the link, with the
amount of spacing being inversely proportional to the capacity of the
link [5]. The basic idea behind the packet-pair mechanism is as follows:
if two probe packets travel together such that they are adjacent at the
bottleneck link, with no packets intervening between them, then on
emerging from the bottleneck link the inter-packet spacing will be
proportional to the transmission time of the first packet over the
bottleneck. This can be seen in Figure 1.

Let $Q_b$ seconds be the time required to forward a packet of length P
bytes through the bottleneck link. If the bottleneck bandwidth is B
(bytes/s) then $Q_b = P/B$. $Q_b$ can be approximated at the receiver’s
end. The problem that then arises is that queuing elements beyond the
bottleneck can distort the spacing between the probe packets. Either the
first or the second packet can be randomly delayed, thus randomly
increasing or decreasing the calculated estimate of the bottleneck
bandwidth. These random variations can be viewed as noise affecting the
consistent inter-packet spacing caused by the bottleneck. Filtering
mechanisms are thus needed to extract the desired measurements.
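
As a minimal sketch of the relation $Q_b = P/B$, the bandwidth estimate
for one probe pair can be computed by inverting it, using the measured
inter-arrival spacing as the approximation of $Q_b$. The function name
and the example numbers below are illustrative assumptions, not part of
the original report.

```python
def bottleneck_bandwidth(packet_bytes: int, spacing_s: float) -> float:
    """Invert Q_b = P / B: return B in bytes per second.

    packet_bytes is the probe length P; spacing_s is the receiver-side
    inter-packet spacing used as the approximation of Q_b.
    """
    if spacing_s <= 0:
        raise ValueError("inter-packet spacing must be positive")
    return packet_bytes / spacing_s


# Hypothetical sample: 1000-byte probes arriving 8 ms apart,
# which corresponds to roughly 125 kB/s.
estimate = bottleneck_bandwidth(1000, 0.008)
```

A single sample like this is exactly what the noise discussed above can
distort, which is why many samples are filtered before an estimate is
accepted.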

2.1  Filtering  algorithm  for  robust  bottleneck  bandwidth  estimation

In [15], Paxson develops a robust algorithm called Packet Bunch Mode
(PBM) that estimates the bottleneck bandwidth along a unicast path. Our
filtering algorithm is adopted from Paxson’s work on PBM. In this
section, we briefly review our filtering techniques. A more in-depth
description of the details of PBM and the selection of the appropriate
values for the required parameters can be found in [15].

Probe packets transmitted by the sender include a sequence number and a
time-stamp indicating the transmission time. The packet’s arrival time
is noted at the receiving end. Inter-packet spacing measurements are
made by recording the difference in arrival times $\Delta T_r$ between
consecutive packets. The difference in transmission times,
$\Delta T_s$, is calculated from the packet time-stamps. The criteria
used to select valid sample measurements are:

•  We define an expansion factor $\xi$ which measures the factor by
which the packets were spread out by the network as:

$$\xi = \Delta T_r / \Delta T_s$$

If $\xi < 1.0$, then the packets were not spread out by the network and
hence not shaped by the bottleneck. Thus, calculations based on their
arrival times should not be used in estimating the bottleneck and are
not accepted as valid samples.

•  If the last packet pair we inspected yielded a valid sample and
spanned an interval of $\Delta T_r'$ then we perform a heuristic test:
if $\Delta T_r / \Delta T_r' > 2$ then the current pair was spaced out
more than twice as much as the previous pair and we skip the current
pair as it is likely to reflect sporadic arrivals.

•  Pairs that include out-of-order arrivals or lost packets are
rejected.
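
The three acceptance criteria above can be sketched as a single pass
over the probe pairs. This is a minimal illustration, not the report's
implementation: the `Probe` record, the function name, and the choice to
reject pairs whose sender time-stamps do not increase are assumptions
made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Probe:
    seq: int                      # sequence number carried in the packet
    sent_at: float                # sender time-stamp, in seconds
    received_at: Optional[float]  # arrival time, or None if the packet was lost


def valid_spacings(pairs: List[Tuple[Probe, Probe]]) -> List[float]:
    """Return the receiver-side spacings (Delta T_r) of the accepted pairs."""
    accepted = []
    prev_valid_gap = None  # Delta T_r' if the last inspected pair was valid
    for first, second in pairs:
        gap = None
        # Criterion 3: reject pairs with a lost or out-of-order packet.
        if first.received_at is not None and second.received_at is not None:
            dt_r = second.received_at - first.received_at
            dt_s = second.sent_at - first.sent_at
            in_order = second.seq > first.seq and dt_r > 0
            # Assumption: non-increasing sender time-stamps are rejected too.
            if in_order and dt_s > 0:
                expansion = dt_r / dt_s  # criterion 1: expansion factor
                # Criterion 2: spacing more than twice the previous valid one.
                sporadic = prev_valid_gap is not None and dt_r / prev_valid_gap > 2
                if expansion >= 1.0 and not sporadic:
                    gap = dt_r
        if gap is not None:
            accepted.append(gap)
        prev_valid_gap = gap
    return accepted
```

Each accepted spacing can then be turned into one bandwidth estimate by
dividing the probe size by it.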

Samples meeting the above criteria are used to calculate a set of
bottleneck estimates. Let $N_b$ be the number of estimates obtained. If
N is the total number of packets sent, then N/2 is the maximum number of
possible estimates using packet pairs. If $N_b$ is less than 70% of N/2
then we reject further analysis of the set of estimates as it consists
of too few estimates. Otherwise, we turn to the problem of extracting
the best estimate from the set. The set of estimates is first sorted in
decreasing order of the frequency of their occurrence. Let X be the
estimate that occurs with maximum frequency. We then search the set for
values that fall within ±5% of X and combine them as a single entry with
value X and frequency equal to the sum total of the individual
frequencies. The set of estimates is thus narrowed down to a set of
disjoint ranges. The estimate that occurs with the maximum frequency is
then selected as the bottleneck bandwidth provided it occurs with a
frequency that exceeds all other estimates by at least 60%. If not, the
results obtained from the set of samples are ambiguous and no estimate
of the bottleneck bandwidth is made.
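
The selection step just described can be sketched as follows. Two
readings are assumptions of this example rather than statements from the
report: the ±5% merge is applied repeatedly to whatever estimates
remain, and “exceeds all other estimates by at least 60%” is taken to
mean that the top frequency is at least 1.6 times the runner-up’s. The
sketch also assumes the estimates are already quantised (for instance
rounded), so that counting repeated values is meaningful.

```python
from collections import Counter
from typing import List, Optional


def select_bottleneck(estimates: List[float], packets_sent: int,
                      min_fraction: float = 0.7, tolerance: float = 0.05,
                      dominance: float = 0.6) -> Optional[float]:
    """Pick the dominant bandwidth estimate, or None if too few or ambiguous."""
    # Too few samples: N_b is below 70% of the N/2 possible pairs.
    if not estimates or len(estimates) < min_fraction * (packets_sent / 2):
        return None
    remaining = Counter(estimates)
    clusters = []  # (representative value X, combined frequency)
    while remaining:
        x, _ = remaining.most_common(1)[0]
        # Merge every value within +/- tolerance of X into one entry.
        members = [v for v in remaining if abs(v - x) <= tolerance * abs(x)]
        clusters.append((x, sum(remaining[v] for v in members)))
        for v in members:
            del remaining[v]
    clusters.sort(key=lambda c: c[1], reverse=True)
    best_value, best_count = clusters[0]
    # Ambiguous unless the winner beats the runner-up by the dominance margin.
    if len(clusters) > 1 and best_count < (1 + dominance) * clusters[1][1]:
        return None
    return best_value
```

For example, with 24 probes sent and ten accepted estimates of which
eight fall within 5% of 125000 bytes/s, the function returns 125000; if
the two most frequent ranges had frequencies 5 and 4, it would return
`None`.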

The above bottleneck estimation algorithm can be extended to step
through an increasing series of packet bunch sizes as outlined in [15]
in order to detect multi-channel bottlenecks and changing bottleneck
bandwidths.

2.2  Estimation  of  the  bottleneck  bandwidth  in  a  multicast  tree

To apply the techniques described in Section 2.1, the traffic source in
the multicast tree transmits a stream of back-to-back probe pairs. Each
receiver measures the arrival times of packets at its end and uses the
filtering algorithm outlined in Section 2.1 to infer the bottleneck
bandwidth of the path between itself and the source. Note that this
method does not, by itself, in any way indicate where along the path the
bottleneck is located.

Figure 2: Original tree topology

3  Tree  Inference  Algorithm 

This section describes our tree inference algorithm, which reconstructs
a logical representation of the multicast distribution tree using
information obtained from the losses seen by the receivers.

Multicast packets flow along a distribution tree rooted at the source.
The receivers form the leaves of the tree, the routers are the internal
nodes in the tree and the links form the edges of the tree. A packet
that is dropped along any link of the distribution tree is lost by all
the downstream receivers in the subtree rooted at the link. The
tree-structured delivery model thus introduces correlations in the
packet losses seen by the different receivers. This loss correlation
between receivers can be exploited to infer the topology of the tree
that caused the observed loss patterns.

Our algorithm reconstructs a ‘logical’ representation of the multicast
tree. A logical representation of a multicast tree is one in which each
interior node is merely the closest common ancestor of all downstream
receivers in the tree [18]. In reality each branch of the logical tree
could consist of a series of links. In order to learn the exact topology
of the tree we would have to enlist the help of each intermediate router
along the path, as is done in the traceroute and mtrace tools. Our
algorithm is based only on end-to-end measurements using only
information that is readily available at the end hosts and requires no
special router support; as such, reconstructing a logical tree is as
accurate as we can get. Knowledge of the logical tree is, however,
sufficient for our purpose because all the receivers downstream of a
given logical branch will see the same path characteristics, such as the
bottleneck bandwidth and loss rate, irrespective of which component link
of the logical branch caused the observed characteristics.

The tree inference algorithm described in the following sections
attempts to reconstruct this logical tree in a bottom-up fashion using
information regarding the loss patterns of the different receivers.
Receivers having similar loss patterns are aggregated together and
represented by a single node one level higher in the tree. The
aggregated nodes can then be regarded as a single node for further
aggregation. The entire tree has been reconstructed when all the
receivers have been coalesced in this manner into a single tree. For
example, in order to rebuild the tree shown in figure 2, the algorithm
initially begins with a set of individual receivers A, B and C.
Information obtained from the loss patterns of the three receivers
indicates that A and B are more closely located than A and C or B and C.
We thus aggregate A and B into a single “macro-node” (AB). Next, (AB)
and C are aggregated to yield the logical tree ((AB)C).


Application of the aggregation techniques outlined above requires
knowledge of two things: first, we need a selection criterion that is
indicative of how closely located receivers are in the tree and second,
we need to know how many receivers are to be aggregated together into a
single representative macro-node at each step of the tree building
process. We first develop the principles behind identifying a pair of
receivers to be coalesced at each step of the selection process, thus
yielding a binary tree, and then generalize the principles to
reconstruct trees with arbitrary fan-out at each interior node.

3.1  Selection  Criteria 

We associate with each receiver X a lossprint $L_X$, which is an
ordered listing of the sequence numbers of packets lost by the receiver
X. In a tree any two receivers A and B see losses as described by their
lossprints $L_a$ and $L_b$ respectively. These lossprints could
potentially have a certain number of losses in common. We call these
common losses the shared losses between receivers A and B.
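
A lossprint maps naturally onto a set of sequence numbers, and the
shared losses onto a set intersection. The sketch below illustrates
this; the function name and the sample lossprints are made up for the
example.

```python
from typing import Iterable, List


def shared_losses(lossprint_a: Iterable[int], lossprint_b: Iterable[int]) -> List[int]:
    """Return the sequence numbers lost by both receivers, in order."""
    return sorted(set(lossprint_a) & set(lossprint_b))


# Hypothetical lossprints for receivers A and B.
L_a = [3, 7, 12, 18, 25]
L_b = [7, 9, 18, 30]
shared = shared_losses(L_a, L_b)  # packets 7 and 18 were lost by both
```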

3.1.1  Selection  criteria:  Shared  losses 

At first glance, the shared losses between a pair of receivers appear
to be an ideal indicator of how closely located the receivers are in the
underlying tree. Net shared losses can, however, be misleading. For
example, in figure 4 consider the case where the link R2-A has a high
loss rate. A could then have a high number of losses in common with
every other receiver. In particular, if the link R1-R2 has a low loss
rate then it is possible that the shared losses between A and C exceed
those between A and B, which could result in the wrong nodes being
coalesced. The flaw in the use of net shared losses as the selection
criterion is easily understood if we look at the ways in which shared
losses occur.

Shared losses arise in two ways. A pair of receivers A and B share the
path from the root to their closest common ancestor. Any packets lost
along this shared path will appear in the lossprints of both A and B.
These losses are caused by the tree structure and are truly indicative
of the underlying tree structure. We call these true shared losses. In
addition to these true shared losses, each receiver’s lossprint will
also include the packets that are lost along the separate paths from the
closest common ancestor to each receiver. It is possible that two copies
of the same packet are lost independently along these distinct paths, on
account of which a portion of the shared losses between A and B are not
caused by the shared path between A and B. These shared losses are
random and are not caused by the underlying tree structure. We call
these false shared losses. The failure modes that arise in the use of
the net shared losses as selection criteria are due to this “false
sharing”.

Figure 4: Selection Criteria
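
The effect of false sharing can be illustrated numerically. The sketch
below assumes a small topology consistent with the description above
(Source to R1, R1 to R2 and to C, R2 to A and to B), independent losses
on each link, and loss rates invented for the example; none of these
numbers come from the report.

```python
def path_loss(*link_loss_rates: float) -> float:
    """Probability that a packet is dropped somewhere on a path of independent links."""
    survive = 1.0
    for p in link_loss_rates:
        survive *= 1.0 - p
    return 1.0 - survive


def shared_loss_probabilities(shared_path, private_a, private_b):
    """Return (true shared, net shared) loss probabilities for two receivers."""
    true_shared = path_loss(*shared_path)
    # False sharing: the packet survives the shared path, then both copies
    # are dropped independently on the two private paths.
    false_shared = (1.0 - true_shared) * path_loss(*private_a) * path_loss(*private_b)
    return true_shared, true_shared + false_shared


# Hypothetical per-link loss rates: R2-A is very lossy, R1-R2 is not.
src_r1, r1_r2, r2_a, r2_b, r1_c = 0.01, 0.01, 0.50, 0.01, 0.30

true_ab, net_ab = shared_loss_probabilities([src_r1, r1_r2], [r2_a], [r2_b])
true_ac, net_ac = shared_loss_probabilities([src_r1], [r1_r2, r2_a], [r1_c])
```

With these rates the net shared loss probability of A and C (about
0.16) exceeds that of A and B (about 0.025), even though A and B have
the larger true shared loss probability (about 0.02 against 0.01), so
ranking by net shared losses would pair the wrong receivers.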

3.1.2  Selection  criteria:  True  shared  losses 

The greater the extent of the shared path between a pair of receivers,
the greater is the probability of their seeing true shared losses. The
probability of seeing true shared losses between a pair of receivers is
thus a good measure of how closely located receivers are in the tree and
we use this probability as the selection criterion in order to identify
the pair of receivers to be coalesced.

At the end host, there is nothing that distinguishes a true shared
packet loss from a false one. The receiver merely sees the net shared
loss. In order to allow the receiver to estimate the approximate number
of true shared losses from the total shared losses, we apply the
following loss model:

•  A and B are arbitrary receivers from the set of all receivers with
lossprints $L_a$ and $L_b$ respectively. Let n be the total number of
packets transmitted at the source.

•  A and B share a certain extent of the path from the source. Let
$P^t_{ab}$ be the probability of seeing losses along this path, i.e.,
$P^t_{ab}$ is the probability of seeing true shared losses between A and
B.

•  Any losses seen by receiver A but not by B occur along the path from
the closest common ancestor of A and B to the receiver A. Let $P_a$ be
the loss probability along this path. Similarly, let $P_b$ be the loss
probability along the path from the closest common ancestor of A and B
to B.

Figure 5: Loss model

The above model is represented by figure 5. Using the above model, we
can derive the following equations:

•  Let the probability of seeing a shared loss (whether true or false)
between A and B be $P_{ab}$. Then,

$$P_{ab} = P^t_{ab} + (1 - P^t_{ab})\,P_a P_b \qquad (1)$$

where the first term on the right hand side is the probability of seeing
true shared losses. The second term is the probability of seeing false
shared losses.

•  Let the probability of seeing a loss at A but not at B be
$P_{a\bar{b}}$. Then,

$$P_{a\bar{b}} = (1 - P^t_{ab})\,P_a (1 - P_b) \qquad (2)$$

Similarly, if the probability of seeing a loss at B but not at A is
$P_{\bar{a}b}$, then,

$$P_{\bar{a}b} = (1 - P^t_{ab})(1 - P_a)\,P_b \qquad (3)$$

•  Solving equations (1) to (3) yields the following solutions:

$$P^t_{ab} = \frac{P_{a\bar{b}} P_{\bar{a}b} + P_{\bar{a}b} P_{ab} + P_{a\bar{b}} P_{ab} + P_{ab}^2 - P_{ab}}{P_{a\bar{b}} + P_{\bar{a}b} + P_{ab} - 1}$$

$$P_a = \frac{P_{a\bar{b}}}{1 - (P_{\bar{a}b} + P_{ab})}$$

$$P_b = \frac{P_{\bar{a}b}}{1 - (P_{a\bar{b}} + P_{ab})}$$

•  Let the number of measured shared losses between A and B be
$|L_{ab}|$ where $L_{ab} = L_a \cap L_b$. We approximate $P_{ab}$ as
$|L_{ab}|/n$.

•  Similarly, if the number of measured losses seen by A but not by B
is $|L_{a\bar{b}}|$, we approximate $P_{a\bar{b}}$ as
$|L_{a\bar{b}}|/n$. We approximate $P_{\bar{a}b}$ as
$|L_{\bar{a}b}|/n$. As n increases, these approximations should converge
to the true value of the defined probabilities.
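
The estimation procedure above can be sketched in a few lines. The code
computes the three measured frequencies from two lossprints and then
evaluates $P^t_{ab}$ in the form $P_{ab} - P_{a\bar{b}} P_{\bar{a}b} /
(1 - P_{ab} - P_{a\bar{b}} - P_{\bar{a}b})$, which follows from
equations (1) to (3) and is algebraically the same as the closed form
given above. The function name and the error raised when every packet
was lost by at least one receiver are assumptions of this example.

```python
from typing import Iterable, Tuple


def estimate_loss_model(lossprint_a: Iterable[int], lossprint_b: Iterable[int],
                        n: int) -> Tuple[float, float, float]:
    """Estimate (P^t_ab, P_a, P_b) from two lossprints and n packets sent."""
    a, b = set(lossprint_a), set(lossprint_b)
    p_ab = len(a & b) / n      # measured shared losses, |L_ab| / n
    p_a_only = len(a - b) / n  # lost at A but not at B
    p_b_only = len(b - a) / n  # lost at B but not at A
    neither = 1.0 - p_ab - p_a_only - p_b_only
    if neither <= 0.0:
        # Assumption: the model cannot be solved if no packet reached both.
        raise ValueError("no packet was received by both receivers")
    p_true = p_ab - (p_a_only * p_b_only) / neither
    p_a = p_a_only / (1.0 - (p_b_only + p_ab))
    p_b = p_b_only / (1.0 - (p_a_only + p_ab))
    return p_true, p_a, p_b
```

With finite samples the estimate of $P^t_{ab}$ can come out slightly
negative when there is little or no true sharing; how to treat that case
is left open here.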

3.2  Binary  Trees 

A binary tree is one in which every interior node in the tree has at
most two children. As we coalesce a pair of receivers together at every
step, our algorithm reconstructs a logical binary tree in which every
interior node has exactly two children.

Using the selection criteria defined in Section 3.1, the tree inference
algorithm works as follows:

Input: A set of receivers $S = \{1, 2, \ldots, N\}$ with lossprints
$L_1, L_2, \ldots, L_N$.

1.  Compute the probability of seeing true shared losses between all
pairs of receivers from the set S.

2.  The pair of receivers, A and B, with the maximum probability of
seeing true shared losses is combined into a single macro-node (AB). Set
$L_{(ab)} = L_a \cap L_b$ and replace A and B by (AB) in S.

3.  Repeat the above steps until all the receivers in S have been fused
into the tree.
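
The three steps above can be sketched as a short greedy loop. This is an
illustrative reading of the algorithm, not the report's implementation:
the function names, the nested-tuple output format, and the fallback
used when no packet reached both nodes are assumptions of the example.

```python
from itertools import combinations


def true_shared_probability(a: frozenset, b: frozenset, n: int) -> float:
    """Estimate P^t_ab from two lossprints under the loss model of Section 3.1.2."""
    shared = len(a & b) / n
    a_only = len(a - b) / n
    b_only = len(b - a) / n
    neither = 1.0 - shared - a_only - b_only
    if neither <= 0.0:
        return shared  # assumption: fall back to the net shared frequency
    return shared - a_only * b_only / neither


def infer_binary_tree(lossprints: dict, n: int):
    """Greedily merge the pair with the largest estimated true shared loss.

    lossprints maps a receiver name to its lost sequence numbers; the result
    is a nested tuple such as ('C', ('A', 'B')).
    """
    if not lossprints:
        raise ValueError("at least one receiver is required")
    nodes = {name: frozenset(losses) for name, losses in lossprints.items()}
    while len(nodes) > 1:
        # Steps 1 and 2: score every pair and pick the most likely merger.
        a, b = max(combinations(nodes, 2),
                   key=lambda pair: true_shared_probability(nodes[pair[0]],
                                                            nodes[pair[1]], n))
        # The macro-node's lossprint is the intersection of its children's.
        nodes[(a, b)] = nodes.pop(a) & nodes.pop(b)
    (tree,) = nodes  # step 3 ends when a single node remains
    return tree


# Hypothetical lossprints: packets 0-4 lost near the source, 10-19 lost on
# a link shared only by A and B, the rest lost on each receiver's own link.
lossprints = {
    "A": list(range(0, 5)) + list(range(10, 20)) + list(range(20, 25)),
    "B": list(range(0, 5)) + list(range(10, 20)) + list(range(30, 35)),
    "C": list(range(0, 5)) + list(range(40, 50)),
}
tree = infer_binary_tree(lossprints, n=100)
```

In this example A and B are merged first and C is attached afterwards,
mirroring the ((AB)C) aggregation described in Section 3.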

Our tree inference algorithm employs a greedy strategy of making the
most likely merger at every step. Our results indicate that such a
strategy works well in practice. Future work could look into algorithms
that consider correlations across multiple nodes. In recent work, the
authors of [19] compare the performance of top-down and bottom-up
clustering algorithms for the reconstruction of the logical tree
topology. Their results indicate that a bottom-up approach yields better
results.

Figure 6: Possible relationships between a pair of nodes to be
coalesced for arbitrary trees

3.3  Arbitrary  tree  topologies 

In the binary trees reconstructed in the previous section, each interior node has a fan-out of exactly 2. As such, the pair of nodes yielded by the selection criteria are always aggregated as sibling nodes and represented by their parent node for further aggregation. In an arbitrary tree topology, interior nodes have a fan-out of two or more. The selected pair of nodes can thus be aggregated either as sibling nodes, as in the case of binary trees, or one of the selected nodes could be the parent node of the other. The two alternatives can be seen in the aggregation of node C and macro-node (AB) in Figure 6.

The ability to distinguish between these two cases stems from the observation that in case 1 the probability of seeing true shared losses between A and B should, under ideal circumstances, equal the probability of seeing true shared losses between macro-node (AB) and node C, i.e. $P^*_{(ab)c} = P^*_{ab}$. In case 2 the probability $P^*_{ab}$ will be greater than $P^*_{(ab)c}$, because A and B share an additional link not shared by C. This added link adds to the true shared losses between A and B, on account of which $P^*_{ab} > P^*_{(ab)c}$. We could thus distinguish between the two subtrees by making the following check:

If $P^*_{(ab)c} = P^*_{ab}$ then the nodes are coalesced as in case 1, else the nodes are coalesced as in case 2.

In reality, since we use the measured losses in order to approximate the probabilities $P_{ab}$, $P_{a\bar{b}}$ and $P_{\bar{a}b}$, the equality criterion for case 1 is too rigid. Strict adherence to the above rules would result in incorrect aggregations. In order to accommodate a certain amount of variation, we would like to identify situations in which $P^*_{(ab)c}$ “almost” equals $P^*_{ab}$. We thus define an error margin $\alpha$ and modify the above rules to:

If $P^*_{(ab)c}$ is within $\alpha$% of $P^*_{ab}$ then the nodes are coalesced as in case 1, else the nodes are coalesced as in case 2.

This decision rule could result in incorrect aggregations being made for subtrees as in case 2 if the additional true shared losses between A and B are responsible for less than $\alpha$% of the probability of seeing true shared losses between A and B. As $\alpha$ will typically be low, such errors will only occur if the loss rate along a shared link is very low. The purpose of these aggregations is to identify nodes that can be grouped together for purposes such as local loss recovery, so such aggregations, although not exact, are acceptable: the low loss rate link is not the bottleneck causing loss, and the problem links, if any, are further upstream and shared by receiver C. For the purpose of local recovery, A, B and C should therefore be aggregated together and treated as belonging to the same loss recovery group.
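The relaxed decision rule can be sketched as follows. The sketch reads "within the error margin of" as a relative difference measured against the shared-loss probability of A and B; that reading, the function name `coalesce_case` and the parameter names are assumptions made for illustration.

```python
def coalesce_case(p_ab, p_ab_c, margin_percent):
    """Return 1 or 2, the case used to coalesce macro-node (AB) with C.

    p_ab:           estimated true-shared-loss probability between A and B.
    p_ab_c:         estimated true-shared-loss probability between (AB) and C.
    margin_percent: error margin, in percent (assumed to be relative to p_ab).
    """
    if p_ab <= 0:
        raise ValueError("p_ab must be positive")
    tolerance = (margin_percent / 100.0) * p_ab
    # Case 1: the two probabilities are "almost" equal.
    # Case 2: A and B share a lossy link that C does not.
    return 1 if abs(p_ab - p_ab_c) <= tolerance else 2


# Hypothetical estimates with a 5% margin.
nearly_equal = coalesce_case(0.10, 0.098, 5)
clearly_lower = coalesce_case(0.10, 0.06, 5)
```

A small margin keeps most case 2 subtrees intact, at the cost of occasionally merging A, B and C when the extra link shared by A and B loses very few packets.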

4 Locating the bottlenecks in a multicast tree

Combining the information obtained by the bottleneck estimation algorithm (Section 2) and the tree inference algorithm (Section 3), it is possible to narrow down the possible locations of the bottlenecks in the multicast tree.

Receivers appear as leaves in the reconstructed logical tree. Section 2 gives us an estimate of the bottleneck bandwidth between the source and each leaf node. The bottleneck bandwidth seen by each interior node is at least equal to the maximum of the bottleneck bandwidth estimates seen by each of its downstream receiver nodes.

Figure 7: The bottleneck bandwidth seen by each interior node equals at least the maximum of the estimate of its downstream nodes


This can be easily understood from the simple example in Figure 7. Node D has to see a maximum rate of at least 100 Kbps in order for receiver B to see a bottleneck rate of 100 Kbps. This implies that the bottleneck limiting the rate seen by receiver A lies along the branch AD. Similarly, node E has to see a rate of at least 1 Mbps, and hence the bottleneck seen by receiver B lies somewhere along the path from E through D to B. We cannot narrow down the location of the 100 Kbps bottleneck link any further: having removed link DA from consideration, we are left with the same case as a unicast path, so we cannot obtain a more precise estimate using only information obtained at the end hosts. However, knowing that the bottleneck link lies somewhere along the path from E to B is sufficient for schemes that do not enlist router support. All receivers downstream from B would share the same bottleneck in any case, irrespective of which component link along the path constitutes the bottleneck, so knowing its exact location does not provide us with any more useful information.
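The localization argument can be sketched in two steps: propagate lower bounds up the inferred tree, then walk up from a receiver until an ancestor's bound exceeds the receiver's own estimate. The tree below mirrors the node names used in the example, but the 50 Kbps estimate for receiver A, the extra receiver C and the function names are hypothetical values chosen for illustration.

```python
def lower_bounds(children, estimates, node, bounds=None):
    """Bound for a leaf is its estimate; for an interior node, the max over children."""
    if bounds is None:
        bounds = {}
    kids = children.get(node, [])
    if kids:
        for kid in kids:
            lower_bounds(children, estimates, kid, bounds)
        bounds[node] = max(bounds[kid] for kid in kids)
    else:
        bounds[node] = estimates[node]
    return bounds


def bottleneck_path(parent, bounds, leaf):
    """Path (top-down) on which the bottleneck seen by `leaf` must lie."""
    path, node = [leaf], leaf
    while node in parent:
        node = parent[node]
        path.append(node)
        if bounds[node] > bounds[leaf]:
            break  # this ancestor sees a higher rate, so the bottleneck is below it
    return path[::-1]


# Hypothetical logical tree and per-receiver estimates in Kbps.
children = {"E": ["D", "C"], "D": ["A", "B"]}
parent = {kid: p for p, kids in children.items() for kid in kids}
estimates = {"A": 50, "B": 100, "C": 1000}
bounds = lower_bounds(children, estimates, "E")
```

Here `bottleneck_path(parent, bounds, "A")` yields the branch from D to A, while for B the walk continues up to E because D's bound does not exceed B's estimate.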


5 Implementation and Testing

5.1 Implementation

The implementation modules for the algorithms described in the previous sections are shown in Figure 8. The topology generated by the random tree generator is constructed in the VINT network simulator, ns [12]. Probe traffic is sent out by the source (root) of the tree. Cross traffic (FTP, Telnet and Constant Bit Rate/UDP) is generated to simulate the cross traffic that could cause queuing delays that appear as noise in the measurements made at the receiving end.

Figure 8: Implementation modules

5.1.1 Random Tree Generator

The random tree generator module outputs a tree topology which is used to test the bottleneck estimation and tree inference algorithms. Input parameters to this module are: maximum number of nodes ($Nodes_{max}$), maximum number of leaf nodes ($Leaves_{max}$) and the maximum fan-out ($Fanout_{max}$) of every node in the tree. The tree generator algorithm works as follows:

• Initially the tree consists of only the root at level 0. The number of children generated by the root is chosen at random from the range $[1, Fanout_{max}]$. These child nodes are added to the tree as level 1 nodes.

• Each newly added child node can be either a leaf or an interior (non-leaf) node. A node may be a leaf node with probability $\frac{Leaves_{max} - Leaves_{current}}{Nodes_{max} - Nodes_{current}}$, where $Leaves_{current}$ and $Nodes_{current}$ are the number of leaves and nodes in the tree so far. Thus, as we approach the desired number of nodes in the tree, nodes have a high probability of being leaf nodes. If $Nodes_{max} = Nodes_{current}$ then a node is a leaf node with probability 1. These heuristic rules ensure that the tree generation process terminates.


• An interior node adds child nodes to the next level in the tree. The number of child nodes generated by an interior node is selected at random in the range $[1, \min(Fanout_{max}, Nodes_{max} - Nodes_{current})]$.

In this way, starting from the root, child nodes are added to successive levels in the tree. The process terminates when the lowest level in the tree has only leaf nodes.
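A level-by-level generator along these lines might look like the sketch below. Two points are assumptions of the sketch rather than statements about the original module: the leaf probability is taken to be the remaining leaf budget divided by the remaining node budget (clamped to [0, 1]), and an interior node that finds no node budget left when its turn comes is simply left as a leaf. The function name and the `seed` parameter are also hypothetical.

```python
import random


def generate_tree(nodes_max, leaves_max, fanout_max, seed=None):
    """Return {node_id: [child_ids]} for a random tree rooted at node 0."""
    if nodes_max < 2 or fanout_max < 1:
        raise ValueError("need nodes_max >= 2 and fanout_max >= 1")
    rng = random.Random(seed)
    children = {0: []}
    n_nodes, n_leaves = 1, 0
    level = [0]                      # interior nodes of the current level
    while level:
        next_level = []
        for parent in level:
            room = nodes_max - n_nodes
            if room == 0:
                n_leaves += 1        # assumption: out of budget, stays a leaf
                continue
            for _ in range(rng.randint(1, min(fanout_max, room))):
                child = n_nodes
                n_nodes += 1
                children[parent].append(child)
                children[child] = []
                left = nodes_max - n_nodes
                # Assumed leaf probability; certain once the budget is spent.
                p_leaf = 1.0 if left == 0 else min(
                    1.0, max(0.0, (leaves_max - n_leaves) / left))
                if rng.random() < p_leaf:
                    n_leaves += 1
                else:
                    next_level.append(child)
        level = next_level           # empty once the lowest level is all leaves
    return children
```

Because every node counts against the node budget and nodes become leaves with certainty once it is exhausted, the loop always terminates; the leaf budget is treated only as a soft target in this sketch.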


5.2  Testing 

In order to quantify the performance of our bottleneck estimation and tree inference algorithms, we augmented the implementation modules in Figure 8 with two test modules. The tree comparator module compares the original tree topology generated by the random tree generator with the inferred tree topology. The bottleneck comparator module compares the estimated bottleneck bandwidths with the actual ones. We have conducted experiments to test the bottleneck estimation and tree inference algorithms. Our results are described in the following sections.
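The text does not describe how the tree comparator decides that two topologies match. One possible approach, shown here purely as an assumed sketch, is to compare logical trees by the sets of receivers found below each interior node, which ignores the order in which children are listed. Trees are represented as nested tuples with receiver names at the leaves; the function names are hypothetical.

```python
def clusters(tree):
    """Return (leaf set, set of leaf sets under each interior node)."""
    if not isinstance(tree, tuple):
        return frozenset([tree]), set()
    leaves, found = frozenset(), set()
    for child in tree:
        child_leaves, child_found = clusters(child)
        leaves |= child_leaves
        found |= child_found
    found.add(leaves)                # the cluster rooted at this interior node
    return leaves, found


def same_topology(tree_a, tree_b):
    """True if both trees group the same receivers under their interior nodes."""
    return clusters(tree_a) == clusters(tree_b)


# Hypothetical original and inferred logical trees.
original = ("C", ("A", "B"))
inferred_ok = (("B", "A"), "C")
inferred_bad = ("A", ("B", "C"))
```

Under this comparison `inferred_ok` matches `original` despite the different child order, while `inferred_bad` does not.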


5.2.1 Bottleneck Estimation

We generated 50 data traces in ns. For each trace the following parameters were varied either singly or in combination with others:

• Topology of the multicast distribution tree.

• Link bandwidths.

• Location of bottlenecks (e.g. towards the leaves, near the root, etc.).

• Amount, type and duration of cross traffic.

• Size of bunches of probe packets (pairs, threes or fours).

• Run time (which affects the number of gathered samples).

For each trace the bottleneck bandwidth seen by each receiver was calculated. The results are tabulated in Table 1. Our tests do not cover the entire range of possible test conditions. Further, the tests are restricted to a simulation environment which differs from actual Internet conditions. Our approach to bottleneck bandwidth estimation needs to be tested on the Internet in order to quantify its performance under realistic network traffic conditions.

Results of estimation                                       No. of estimates
Estimate within 1% of exact value                           296
No estimate due to insufficient number of valid samples     12
Incorrect estimate                                          4
Total number of estimates                                   312

Table 1: Results of Bottleneck Estimation

5.2.2 Tree Inference Algorithm

Figures 9 and 10 plot our test results for binary trees. We plot the probability of correctly inferring the tree for an increasing number of packets transmitted at the source. As the number of transmitted packets increases, the number of loss samples collected at the receiver's end increases and the approximated probability of seeing true shared losses approaches its true value. We would thus expect the probability of inferring the correct tree to approach unity as the number of collected samples increases. Figures 9 and 10 plot the observed results for different sized trees with the link loss rates selected from a uniform distribution within a selected range. We see that the observed probability of correctly inferring the tree does in fact converge towards one with an increasing number of transmitted packets.

Figure 9: Binary trees: individual link loss rate is selected at random in the range [0%, 10%]. Curves: avg(# nodes) = 10.5, 22.2 and 46; x-axis: log2(# pkts sent).

Figure 10: Binary trees: individual link loss rate is selected at random in the range [0%, 5%]. Curves: avg(# nodes) = 10.5, 22.3 and 45.9.


6  Related  Work 

Bolot used a stream of packets sent at fixed intervals to probe several Internet paths in order to characterize delay and loss behavior [1]. In [6], the author proposes a “packet-pair” scheme to determine the bottleneck service rate and uses this to develop a rate-based flow control scheme. Keshav’s work is in the context of unicast traffic and assumes a round-robin-like queue service discipline. [2] describes the implementation of BPROBE, a tool which provides an estimate of the uncongested bandwidth of a path by sending a series of ICMP echo packets from source to destination and measuring the inter-arrival times between successive packets at the source. [14, 15] demonstrates the fundamental limitations of sender-based packet pair techniques and advocates receiver-based techniques. Paxson also points out the failure of packet-pair techniques in the face of multi-channel bottlenecks and generalizes the receiver-based packet pair (RBPP) mechanism to propose a significantly more robust procedure, “packet-bunch modes” (PBM), which is essentially based on sending bunches of probe packets and varying the bunch size, keeping in mind the possibility of finding more than one bottleneck value.

[17] proposes a loss-delay based adjustment algorithm for adapting the transmission rate of multimedia applications to the congestion level of the network. The authors estimate the bottleneck bandwidths within the multicast tree in order to dynamically determine the adaptation parameters. Estimation of the bottlenecks is done by enhancing RTP with the packet pair approach. The filtering mechanism used is similar to that adopted in BPROBE.

Route tracing tools developed so far exploit certain features within the routers in order to infer the path from source to destination. The traceroute tool built by Van Jacobson discovers the path between a source and receiver of unicast traffic by using the TTL field of the IP packet header to force intermediate routers to send an error indication (ICMP time exceeded) packet back to the source, thus exposing the routers along the path between the source and receiver. The pathchar tool, also developed by Jacobson, estimates the bandwidth, delay, average queue and loss rate of every hop between any source and destination on the Internet. Pathchar uses the same basic technique as traceroute and measures the time between the transmission of an IP packet from the source and the return of the corresponding ICMP packet from an intermediate router. Analysis of the timing data reveals the characteristics of each link along the path.

Estimation of the topology of the multicast tree can be done using the tool “mtrace”. mtrace discovers the multicast path from a source to a receiver using an MTRACE tracing feature implemented in multicast routers that is accessed as an extension to the IGMP protocol. A trace query is passed hop-by-hop along the reverse path from the receiver to the source, collecting hop addresses, packet counts and routing error conditions along the path, and returning the response to the requestor as a standard unicast packet. The Tracer protocol [8] uses the same MTRACE router function in order to organize the receivers of a multicast group deterministically into a logical tree structure in order to achieve effective error recovery and congestion control. In Tracer each receiver sends an MTRACE query to the source of the tree. With the existing implementation of MTRACE, this could cause scaling problems due to an implosion of MTRACE queries towards the source. Further, this places a heavy load on the source, which has to unicast replies back to every receiver. In order to improve the efficiency of tracing in Tracer, Levine et al. propose the addition of source-based multicast tracing to IGMP.


7  Conclusions 

In this paper, we presented algorithms that allow a receiver to infer the logical topology of the multicast tree, the bottleneck bandwidth of the path between the source and each receiver in the tree and the approximate location of the bottlenecks in the tree. These algorithms attempt to answer the question: how much topological information can a receiver in a multicast tree glean using only information that is readily available at the end-hosts with the existing IP Multicast service model?

Through the use of an IP group address, the IP multicast service provides a “level of indirection” on account of which receivers and senders need not know about each other. While this receiver anonymity allows multicast sessions to scale to large sizes, potentially useful information is lost in the process. Mechanisms that allow group members to reconstruct this lost information are thus useful. The algorithms presented here are a step in this direction.